
Conversation

@gabe-l-hart (Collaborator)

Previously, the sum printed by llama-eval-callback was only the sum of the values that actually got printed, because the printing function compacts its view of the tensor by jumping over indices. This PR makes a double pass over each tensor: one pass iterating over all values to compute the sum, and a second pass, as before, to build the printed view. This makes it much easier to compare activations between llama.cpp and transformers.
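For illustration, here is a minimal C++ sketch of the two-pass idea. This is not the actual eval-callback code: `print_tensor_summary`, `data`, `n`, and `max_print` are hypothetical names, and it assumes a contiguous f32 buffer, whereas real ggml tensors can have other types and strided layouts.

```cpp
#include <cstdio>
#include <cstdint>

// Hypothetical helper: sums ALL elements, but prints only a compacted view.
static void print_tensor_summary(const float * data, int64_t n, int64_t max_print) {
    // Pass 1: accumulate the sum over every element, not just the printed ones.
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += data[i];
    }

    // Pass 2: print a compacted view by jumping over indices, as before.
    const int64_t stride = n > max_print ? n / max_print : 1;
    for (int64_t i = 0; i < n; i += stride) {
        printf("%8.4f ", data[i]);
    }
    printf("\nsum = %f\n", sum); // full-tensor sum, independent of what was printed
}

int main() {
    float data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    // Only a few values are printed, but sum = 55 covers all ten.
    print_tensor_summary(data, 10, 4);
}
```

The point of the two passes is that the printed view can skip as aggressively as it likes without affecting the reported sum, so the sum stays directly comparable against a full-tensor sum computed on the transformers side.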

… printed values

This makes it much easier to compare between llama.cpp and transformers!

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>
@gabe-l-hart (Collaborator, Author)

@ggerganov I'm assigning you to this since you gave a 👍 to my comment on the other PR, but feel free to forward it.

gabe-l-hart merged commit a8bca68 into ggml-org:master on Aug 28, 2025 (48 checks passed).
gabe-l-hart deleted the gabe-l-hart/llama-eval-callback-sum branch on August 28, 2025 at 20:27.
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 29, 2025
… printed values (ggml-org#15637)
